Text-Style Conversion of Speech Transcript into Web Document for Lecture Archive
Abstract
Accumulating spoken documents on the web is important for the knowledge society. However, because spontaneous speech is highly redundant, a faithfully transcribed text is hard to read in a web browser and is therefore not suitable as a web document. This paper proposes a technique for converting spoken documents into web documents for the purpose of building a speech archiving system. The technique edits automatically transcribed texts and improves their readability in the browser. Readable text is generated by applying techniques such as paraphrasing, segmentation, and structuring to the transcribed texts. Editing experiments using lecture data demonstrated the feasibility of the technique. A prototype spoken document archiving system was implemented to confirm its effectiveness.
Similar resources
Usable speech recognition
A growing number of lecture webcasts are archived after being delivered live. In the absence of transcripts, users are faced with increased difficulty in performing tasks easily achieved with text documents (retrieval, browsing, skimming). Unfortunately, speech recognition systems do not perform satisfactorily when transcribing lectures. In this paper, we present an overview of the ePresence le...
A Lecture-On-demand System using Spoken Document Retrieval
This research proposes a lecture-on-demand system, which retrieves video data in response to users’ information needs. For this purpose, we utilize texts and audio/video data for a single lecture. Our system extracts audio tracks from lecture video data, transcribes them by way of a large vocabulary continuous speech recognition system, and produces a lecture database. Users can selectively bro...
Phonetic/linguistic web services at BAS
We present recent developments in the collection of phonetic-linguistic web services provided by the Bavarian Archive of Speech Signals (BAS). The BAS back end web services are REST based and can be easily integrated into user applications. Several public web interfaces have been implemented that utilize these back end services to provide easy-to-use access to high-end linguistic and phonetic pr...
Text to Hypertext Conversion with LaTeX2HTML
LaTeX2HTML is a conversion tool that allows existing documents written in LaTeX to become part of a global multimedia system. This paper presents some of the reasons for using such a system and describes the basic conversion process. 1 World Wide Web: A Global Multimedia System. Imagine a system that links all the text, data, digital sounds, graphics and video on all the world’s computers into a ...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis gives the computer the human ability to produce speech. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Journal title: JACIII
Volume 13
Publication date: 2009